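Appendix A Posterior Reparameterization
In this section we motivate the design choices and inductive biases that we encode into our neural encoder network $e$, which models the relative accuracies of the weak supervision sources $\lambda$. Recall that we model the probability that a sample $x \in \mathcal{X}$ has the class label $y \in \mathcal{Y} = \{1, \dots, C\}$ as
$$P_\theta(y \mid \lambda) = \operatorname{softmax}(s)_y \, P(y), \qquad s = \theta(\lambda, x)^\top \lambda \in \mathbb{R}^C. \tag{4}$$
Connection to prior PGM models
We now motivate this choice by deriving a less expressive variant of it from the standard Markov Random Field (MRF) used in the related work. The attention scores $\theta(\lambda, x) \in \mathbb{R}^m$ assign sample-dependent accuracies to each labeling function. If we instead view them as sample-independent parameters $\theta_1$ and thereby drop the features from the equation, as is done in the related work [30, 32, 19, 11], then, since $s \in \mathbb{R}^C$ requires the $m$ LF outputs to enter Eq. 4 in one-hot form, the $y$-th score is $s_y = \theta_1^\top \mathbb{1}\{\lambda = y\}$, where $\mathbb{1}\{\lambda = y\} \in \{0, 1\}^m$ collects the indicators $\mathbb{1}\{\lambda_j = y\}$. We can therefore rewrite Eq. 4 as
$$P_\theta(y \mid \lambda) \propto \exp\big(\theta_1^\top \mathbb{1}\{\lambda = y\}\big)\, P(y) \propto \exp\big(\theta_1^\top \phi_1(\lambda, y) + \theta_2^\top \phi_2(\lambda)\big)\, P(y),$$
where $\phi_1(\lambda, y) = \mathbb{1}\{\lambda = y\}$, and the second step holds because $\theta_2^\top \phi_2(\lambda)$ does not depend on $y$ and cancels on normalization. We can recognize $P_\theta$ as a distribution from the exponential family, and more specifically as a pairwise MRF, or factor graph, with canonical parameters $\theta = (\theta_1, \theta_2)$, corresponding sufficient statistics, or factors, $\phi(\lambda, y) = (\phi_1(\lambda, y), \phi_2(\lambda))$, and log partition function $Z_\theta$. The accuracy factors and parameters $\phi_1, \theta_1$ are the core component of this model; in binary models they sometimes take the form $\phi_1(\lambda, y) = \lambda y$, as in [30, 19, 11]. As the derivation above shows, the label-independent factors $\phi_2(\lambda)$ have no direct influence on the latent label posterior, but they are often used to model labeling propensities $\mathbb{1}\{\lambda \neq 0\}$ and correlation dependencies $\mathbb{1}\{\lambda_i = \lambda_j\}$, which can be important for PGM parameter learning but are susceptible to misspecification [39, 11, 8].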
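The sample-independent special case can be checked numerically. The sketch below is a minimal illustration, not the authors' encoder: it assumes LF votes are integers in {0, 1, ..., C} with 0 meaning abstain, and it renormalizes softmax(s)_y P(y) over classes. The function names, weights, and votes are hypothetical. The `label_independent` argument stands in for a term like θ_2ᵀφ_2(λ) that is added to every class score.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(v - top) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]


def posterior(theta, votes, prior, label_independent=0.0):
    """Label posterior P(y | lambda) for classes y = 1..C (hypothetical helper).

    theta: sample-independent accuracy weights theta_1, one per LF (length m).
    votes: LF outputs lambda_j in {0, 1, ..., C}; 0 is treated as an abstain.
    prior: class prior P(y), a list of length C.
    label_independent: stand-in for theta_2^T phi_2(lambda); it is added to
        every class score, so it cannot change the normalized posterior.
    """
    num_classes = len(prior)
    # s_y = theta_1^T 1{lambda = y}: each LF adds its weight to the class it votes for.
    scores = [
        sum(t for t, v in zip(theta, votes) if v == y) + label_independent
        for y in range(1, num_classes + 1)
    ]
    # Multiply softmax(s)_y by P(y), then renormalize over classes (assumption).
    unnormalized = [p * q for p, q in zip(softmax(scores), prior)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


theta = [2.0, 1.0, 0.5]   # hypothetical accuracy weights for three LFs
votes = [1, 2, 1]         # LF 1 and LF 3 vote for class 1, LF 2 votes for class 2
prior = [0.5, 0.5]        # uniform class prior
print(posterior(theta, votes, prior))
print(posterior(theta, votes, prior, label_independent=3.7))
```

Both calls print approximately [0.818, 0.182]: the class scores are 2.5 and 1.0, and shifting both by the same label-independent constant leaves the posterior unchanged, which is why such factors only matter for PGM parameter learning.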
Our own parameterization is therefore a more expressive variant of these latent-variable PGM models, in which LF accuracies are assigned on a sample-by-sample basis. Furthermore, our neural encoder network outputs these accuracies as a function of both the LF outputs and the features, and is expected to learn the easily misspecified dependencies and label-independent statistics implicitly.
The top 2 performance scores are highlighted as First, Second. Triplet-median [11] is not listed as it only converged for IMDB with 12 LFs (F1 = 73.0
Appendix A Approximate Behavior of Metrics on Sequential Data
How do different metrics behave when used to measure autoregressive model outputs?
A.1 Per-Token Error Probability is Resolution-Limited
Here, resolution refers to "the smallest interval measurable […] After F coin flips, we can only resolve the coin's probability of […] A.3), we ignore how likely the language model is to over- […] Section 3.2 of [23] gives the exact definition, but the […]
Simulations show that as the per-token error probability increases slightly (e.g., from 0.05 to 0.1), the ROUGE-L-Sum metric falls sharply.
Figure 10: Induced emergent MNIST classification ability in convolutional networks.
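The resolution argument can be made concrete with a short sketch, assuming the coin's probability is estimated by the empirical frequency k/F: after F flips the estimate can only take the values 0, 1/F, ..., 1, so the resolution is 1/F. The second function illustrates how per-token errors compound, under the simplifying assumption that errors are independent: the probability that all L tokens are correct is (1 − p)^L. This exact-match-style quantity is a stand-in chosen for illustration, not the ROUGE-L-Sum metric used in the simulations; the function names and the length L = 10 are hypothetical.

```python
from fractions import Fraction


def resolvable_estimates(num_flips):
    """All values the empirical heads frequency k / F can take after F flips."""
    return [Fraction(k, num_flips) for k in range(num_flips + 1)]


def resolution(num_flips):
    """Smallest gap between two distinguishable frequency estimates."""
    grid = resolvable_estimates(num_flips)
    return min(b - a for a, b in zip(grid, grid[1:]))


def all_tokens_correct(per_token_error, length):
    """P(every token in a length-L output is correct), assuming independent errors."""
    return (1.0 - per_token_error) ** length


print(resolution(20))  # 1/20
for p in (0.05, 0.1):
    print(p, all_tokens_correct(p, length=10))
```

With L = 10, raising p from 0.05 to 0.1 lowers the all-correct probability from about 0.60 to about 0.35, a large drop for a small change in per-token error.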
Fake News in Social Networks
Aymanns, Christoph, Foerster, Jakob, Georg, Co-Pierre, Weber, Matthias
We propose multi-agent reinforcement learning as a new method for modeling fake news in social networks. This method allows us to model human behavior in social networks both in unaccustomed populations and in populations that have adapted to the presence of fake news. The latter in particular is challenging for existing methods. We find that a fake-news attack is more effective if it targets highly connected people and people with weaker private information. Attacks are more effective when the disinformation is spread across several agents than when it is concentrated with more intensity on fewer agents. Furthermore, fake news spreads less well in balanced networks than in clustered networks. We test some of our findings in a human-subject experiment. The experimental evidence supports the predictions of the model, suggesting that the model is suitable for analyzing the spread of fake news in social networks.
How Chain-of-Thought Works? Tracing Information Flow from Decoding, Projection, and Activation
Yang, Hao, Zhao, Qinghua, Li, Lei
Chain-of-Thought (CoT) prompting significantly enhances model reasoning, yet its internal mechanisms remain poorly understood. We analyze CoT's operational principles by tracing information flow in reverse across the decoding, projection, and activation phases. Our quantitative analysis suggests that CoT may act as a decoding-space pruner, leveraging answer templates to guide output generation; higher template adherence correlates strongly with improved performance. Furthermore, we find, surprisingly, that CoT modulates neuron engagement in a task-dependent manner, reducing neuron activation in open-domain tasks but increasing it in closed-domain ones. These findings offer a novel mechanistic interpretability framework and insights that enable targeted CoT interventions for designing more efficient and robust prompts. Our code and data are available at https://anonymous.4open.science/r/cot-D247.
Here's how to generate a truly random number with quantum physics
Very little in this life is truly random. A coin flip is influenced by the flipper's force, its surrounding airflow, and gravity. Similar variables dictate rolling a pair of dice or shuffling a deck of cards, while even classical computing's cryptographic algorithms are theoretically susceptible to outside influence or bias. "True randomness is something that nothing in the universe can predict in advance," explained Krister Shalm, a physicist at the National Institute of Standards and Technology (NIST).
Enough Coin Flips Can Make LLMs Act Bayesian
Gupta, Ritwik, Corona, Rodolfo, Ge, Jiaxin, Wang, Eric, Klein, Dan, Darrell, Trevor, Chan, David M.
Large language models (LLMs) exhibit the ability to generalize given few-shot examples in their input prompt, an emergent capability known as in-context learning (ICL). We investigate whether LLMs utilize ICL to perform structured reasoning in ways that are consistent with a Bayesian framework or rely on pattern matching. Using a controlled setting of biased coin flips, we find that: (1) LLMs often possess biased priors, causing initial divergence in zero-shot settings, (2) in-context evidence outweighs explicit bias instructions, (3) LLMs broadly follow Bayesian posterior updates, with deviations primarily due to miscalibrated priors rather than flawed updates, and (4) attention magnitude has negligible effect on Bayesian inference. With sufficient demonstrations of biased coin flips via ICL, LLMs update their priors in a Bayesian manner.
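For reference, the normative Bayesian behavior for a biased coin can be sketched with the textbook conjugate Beta-Bernoulli update. This is one plausible reference model, assumed here; the abstract does not specify the paper's exact baseline. The prior parameters and the flip sequence below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BetaPosterior:
    """Beta(alpha, beta) belief over a coin's probability of heads."""
    alpha: float
    beta: float

    def update(self, flips):
        """Conjugate update: each 'H' adds 1 to alpha, each 'T' adds 1 to beta."""
        heads = sum(1 for f in flips if f == "H")
        tails = sum(1 for f in flips if f == "T")
        return BetaPosterior(self.alpha + heads, self.beta + tails)

    def mean(self):
        """Posterior predictive probability that the next flip is heads."""
        return self.alpha / (self.alpha + self.beta)


# Hypothetical prior biased toward heads, e.g. a miscalibrated zero-shot belief.
prior = BetaPosterior(alpha=8.0, beta=2.0)
# Hypothetical in-context evidence: 3 heads per 10 flips, repeated 5 times.
evidence = "HTTTHTTHTT" * 5
for n in (0, 10, 50):
    print(n, round(prior.update(evidence[:n]).mean(), 3))
```

The posterior mean moves from 0.8 (prior only) to 0.55 after 10 flips and 0.383 after 50, approaching the empirical rate of 0.3. This mirrors the qualitative pattern the abstract reports: with enough demonstrations, a miscalibrated prior is outweighed by in-context evidence.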
Chain-of-Thought in Large Language Models: Decoding, Projection, and Activation
Yang, Hao, Zhao, Qianghua, Li, Lei
Chain-of-Thought prompting has significantly enhanced the reasoning capabilities of large language models, with numerous studies exploring factors influencing its performance. However, the underlying mechanisms remain poorly understood. To further demystify the operational principles, this work examines three key aspects: decoding, projection, and activation, aiming to elucidate the changes that occur within models when employing Chain-of-Thought. Our findings reveal that LLMs effectively imitate exemplar formats while integrating them with their understanding of the question. During generation they exhibit fluctuations in token logits but ultimately produce a more concentrated logit distribution, and they activate a broader set of neurons in the final layers, indicating more extensive knowledge retrieval than with standard prompts. Our code and data will be made publicly available when the paper is accepted.
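The abstract does not state how "concentration" of the logit distribution is measured. One common choice, assumed here, is the Shannon entropy of the softmax over the logits: lower entropy means probability mass is concentrated on fewer tokens. The logit values below are made up for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(logits):
    """Shannon entropy (in nats) of softmax(logits); lower means more concentrated."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0)


flat = [1.0, 0.9, 0.8, 0.7]    # hypothetical next-token logits, spread out
peaked = [4.0, 0.9, 0.8, 0.7]  # hypothetical logits with one dominant token
print(entropy(flat), entropy(peaked))
```

The flat logits give an entropy close to the maximum ln 4 ≈ 1.386 for four tokens, while the peaked logits give a much lower value.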
Code Prompting: a Neural Symbolic Method for Complex Reasoning in Large Language Models
Hu, Yi, Yang, Haotong, Lin, Zhouchen, Zhang, Muhan
Large language models (LLMs) have scaled up to unlock a wide range of complex reasoning tasks with the aid of various prompting methods. However, current prompting methods generate natural-language intermediate steps to help reasoning, which can cause imperfect task reduction and confusion. To mitigate these limitations, we explore code prompting, a neural symbolic prompting method with both zero-shot and few-shot versions that elicits code as intermediate steps. We conduct experiments on 7 widely used benchmarks involving symbolic reasoning and arithmetic reasoning. Code prompting generally outperforms chain-of-thought (CoT) prompting. To further understand the performance and limitations of code prompting, we perform extensive ablation studies and error analyses, and identify several exclusive advantages of symbolic prompting over natural language. We also consider ensembling code prompting and CoT prompting to combine the strengths of both. Finally, we show through experiments how code annotations and their locations affect code prompting.
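To illustrate what "code as intermediate steps" can look like for a symbolic reasoning task, here is a hypothetical coin-flip state-tracking question and the kind of program a code prompt might ask the model to write. Neither the question nor the code is taken from the paper, and the abstract does not say whether the model itself or an external interpreter evaluates the code; here it is simply run in Python.

```python
# Hypothetical question: "A coin is heads up. Alice flips the coin. Bob does not
# flip the coin. Carol flips the coin. Is the coin still heads up?"


def coin_still_heads_up(start_heads_up, actions):
    """Track the coin state; each True in `actions` means that person flips it."""
    heads_up = start_heads_up
    for flips in actions:
        if flips:
            heads_up = not heads_up  # each flip toggles the visible side
    return heads_up


actions = {"Alice": True, "Bob": False, "Carol": True}
answer = coin_still_heads_up(True, actions.values())
print("yes" if answer else "no")  # two flips cancel out, so this prints "yes"
```

Writing the state update explicitly makes the task reduction exact: the answer depends only on the parity of the number of flips, which natural-language intermediate steps can lose track of.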